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We introduce a scheme for deriving an optimally-parametrised Langevin dynamics of few collective 
variables from data generated in molecular dynamics simulations. The drift and the position- 
dependent diffusion profiles governing the Langevin dynamics are expressed as explicit averages over 
the input trajectories. The proposed strategy is applicable to cases when the input trajectories are 
generated by subjecting the system to a external time-dependent force (as opposed to canonically- 
equilibrated trajectories). Secondly, it provides an explicit control on the statistical uncertainty of 
the drift and diffusion profiles. These features lend to the possibility of designing the external force 
driving the system so to maximize the accuracy of the drift and diffusions profile throughout the 
phase space of interest. Quantitative criteria are also provided to assess a posteriori the satisfiability 
of the requisites for applying the method, namely the Markovian character of the stochastic dynamics 
of the collective variables. 



With modern molecular dynamics approaches it is pos- 
sible to follow the dynamical evolution of systems com- 
posed by a large number of particles. The resulting tra- 
jectory, obtained through numerical integration of the 
equations of motion, corresponds to a discrete trace in a 
phase space of very high dimensionality. In order to an- 
alyze this trajectory it is customary to monitor the time 
evolution of only a limited number of collective variables, 
also referred to as reaction coordinates, or order parame- 
ters. The latter are explicit functions of the microscopic 
degrees of freedom of the system for which they ought to 
provide a viable coarse-grained description. The system 
dynamics and equilibrium properties are then character- 
ized in terms of these variables alone. 

This dimensional reduction strategy, which has ubiqui- 
tous applications in physics and chemistry and biophysics 
(see e.g. refs. [H, 0, lEI3l)j has its most-general formula- 
tion in the Zwanzig-Mori projection procedure^. This 
scheme is of fundamental conceptual importance given its 
general formal applicability to systems whose evolution is 
governed by a Liouvillc operator. In such contexts it can 
be demonstrated that the time evolution of the collective 
variables is describable by a stochastic dynamics with a 
non-trivial memory kernel. Owing to the formidable diffi- 
culties posed by the a priori determination of the memory 
kernel; how to devise practical and general algorithms for 
carrying out the dimensional reduction remains an active 
area of research. 

Several approaches have been developed over the years 
to perform dimensional reduction in specific contexts 
0, i, i, M, [O, E [H, [ii, [13. The validity of the 
overdamped Langevin dynamics is very commonly as- 
sumed a priori for describing the dynamical evolution 
of the reduced system. Consequently, the system free 
energy landscape and the diffusion coefhcient profile 
can be expressed in terms of the Kramers-Moyal coef- 
ficients calculated a posteriori from extensive dynami- 



cal trajectories [6[. An interesting illustration of this 
strategy is provided in ref. [l^l where Kopelevich et 
al. estimate the Langevin drift and diffusion coefficients 
from short trajectories with different initial conditions. 
In other commonly-employed approaches the Langevin 
equation parameters are derived from a maximum like- 
lihood principle 0, [13, [l^l- Specifically, the free energy 
and diffusion coefficient profiles are chosen in such a way 
that the time evolution of the collective variables actu- 
ally observed in the molecular dynamics trajectory has 
the highest realization probability. The latter is quan- 
tified by computing the Onsager-Machlup action along 
the trajectory, see the work of Gullingsrud et al. Q- As 
shown by Hummer (lo| , this scheme can also be general- 
ized by allowing for a position-dependent diffusion coef- 
ficient. Another powerful related approach is the one of 
Horenko et al. jl3| where the evolution of a system is as- 
similated to a diffusive process in a scries of harmonic free 
energy wells. Transitions between the wells are described 
as discontinuous "jump" processes, with a suitable tran- 
sition probability per unit time. The parameters of the 
model (the position and width of the harmonic wells, the 
diffusion coefficients in the wells and the jumping rates) 
are also derived a posteriori from a maximum likelihood 
approach. 

Most available approaches have been formulated and 
designed to be applied to take as input equilibrated 
(canonical) trajectories. Recent advances in thermody- 
namic sampling techniques, however, stimulate the for- 
mulation of more general approaches applicable to sys- 
tems subjected to external time-dependent biases. Large 
systems with a corrugated energy landscape would spon- 
taneously evolve very slowly and the introduction of suit- 
able external forces provides an effective means of driv- 
ing the system through the reduced phase space. This 
is commonly exploited in several thermodynamic sam- 
pling techniques, such as steering [l6|, lo c al- ele vat ion p7j 
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adaptive force biaslTql . flooding [l^, Wang-Landau|2q| 
and metadvnamics [211. [2^ . To the best of our knowl- 
edge, the method of GuUingsrud et al. is the only 
available maximum-likelihood approach which is appli- 
cable to systems (whose diffusion coefficient is known a 
priori) subjected to externally-applied biases. 

Building on the previous studies mentioned above, we 
here formulate and apply a novel maximum-likelihood 
scheme that allows to recover efficiently the equilibrium 
and dynamic properties of the reduced system even when 
subjected to an externally-applied time-dependent force. 
The variational approach addressed in this study comple- 
ments the advantages of the strategies in refs. and 
0] as it allows recovering a posteriori a non-constant dif- 
fusion coefficient profile while accounting for externally- 
applied forces. 

The method provides not only the drift and diffusion 
coefficients for the system (in one or more collective vari- 
ables) but also an estimate of their statistical errors. 
For all these quantities we derive expressions that are 
straightforwardly calculated by averaging suitable ob- 
scrvablcs along the dynamical trajectories. The possibil- 
ity to control the error on the drift and diffusion terms 
of the reduced system opens the possibility to design the 
applied external bias so to achieve a pre-assigned profile 
of statistical uncertainties for the quantities of interest. 

In the following we shall first derive the maximum- 
likelihood expressions for the drift and diffusion coeffi- 
cient profiles and their errors. The advantages and range 
of applicability of the method are finally illustrated and 
discussed for a specific system, namely the problem of 
looping of a self-avoiding polymer chain in a crowded 
medium. 

It should be remarked that the validity of the approach 
relies crucially on an appropriate choice of the collective 
variable whose dynamics, sampled at appropriate time 
intervals, must have a Markovian character. This re- 
quirement is not necessarily fulfilled by an arbitrarily 
chosen variable. Indeed, even if the trajectory of the 
system is generated by a Markovian process (e.g. molec- 
ular dynamics with Langevin thermostats), the dynamics 
of a single, projected variable cannot be expected to be 
Markovian too[5|. Though no simple a priori criteria 
can be adopted for a good choice of collective variables 
we discuss, in section Hi Dl how quantitative schemes can 
be introduces for verifying a posteriori if a given time 
series is reliably described by a Morkovian process and if 
the overal approach can consistently be applied. 



OPTIMAL LANGEVIN DESCRIPTION OF A 
STOCHASTIC DYNAMICS PROCESS 



We consider a system with several microscopic degrees 
of freedom and whose salient properties are described 
through a much smaller number of collective variables 
(CVs), Si=i,...Ar, chosen a priori and defined in terms 



of the microscopic variables. The system is assumed to 
evolve in time under the combined action of two kind of 
forces: (i) the thermodynamic force, tending to establish 
the canonical equilibrium associated to a given temper- 
ature T, and (ii) a time-dependent external force acting 
on the collective variables. In the following we shall in- 
dicate with 9i(t) the instantaneous external force conju- 
gated to the ith collective variable. A prototype system, 
which will be discussed later, is constituted by a poly- 
mer chain where the fundamental degrees of freedom are 
the centers of its spherical monomers. A single collective 
variable will be used, namely the polymer end-to-end dis- 
tance. The polymer dynamics is controlled by both the 
thermal buffeting of the surrounding solvent molecules 
and by an externally-controlled stretching force applied 
to the chain ends. 

The evolution of the system is followed at the level of 
the collective variables through an equispaced time se- 
ries T = {s (0) , s {dt) , . . . , s (t) , . . .}, where s (t) denote 
the array of the instantaneous CVs values, Si=i,...,jv (0- 
The objective is to take the discrete trajectory T and 
the accompanying time series of the externally-applied 
forces, {9 (0) , 9 (dt) , . . . ,6 (t) , . . .}, as the sole inputs for 
deriving the best parametrization of the system prop- 
erties within a Langevin description of the CVs evolu- 
tion. In particular, the aim is to recover the thermody- 
namic forces and diffusion matrices of the isolated sys- 
tem for a wide range of CVs values by recording how the 
externally-driven system evolves. 

The extraction of the optimal Langevin parametriza- 
tion is carried out within a maximum likelihood ap- 
proach, a framework profitably used in other previous 
approaches 0, HO; [III- We start by assuming that for 
a suitable choice of the discretization time interval, dt, 
the CVs evolution is describable as a Markovian pro- 
cess. The probabifity to observe a specific trajectory T 
is accordingly 
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where dsi (t) = Si{t + dt) — Si (t) and tt (s(t),c?s (t)) is the 
probability of the elementary step. 

For non-externally-driven systems, described by a single 
collective variable, s, subject to an overdamped Langevin 
evolution with constant diffusion coefficient D in a free 
energy landscape, J-{s), the probability of the elemen- 
tary step has a simple Gaussian form: tt (s{t),ds) oc 

exp (ds + [DdT (s)) dtf 0]. For simplicity 

of notation in the previous expression and in the follow- 
ing it is implied that the free energy T is expressed in 
units of the thermal energy, KgT. For the case of sev- 
eral collective variables and if the diffusion coefficients 
depends on the CVs themselves, the previous expression 
generahzes to@: 



TT (s, ds) (X 
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where a summation of repeated indexes is implied and 

X^ (t) = ds, (t) + (Aj {s) d,T (s) - 5, dt 

= dsi (t) — Vi (s) dt. (3) 

where Vi{s) is the drift field @. Without loss of gener- 
ality, the diffusion matrix is assumed to be symmetric: 
= Dj,{s)^]. Eq. H]), © and ©, for a given 
choice of D (s) and v (s) , allow computing the probabil- 
ity of a trajectory T. The stochastic differential equation 
leading to Eq. ^ and ^ is given by 



ds, (t) = V, {s {t))dt + V2 D^^^is (t)) dWj (t) (4) 

where {dWi {t)} is a TV— dimensional Wiener process. 
In the presence of the external force 9i (i), Eq. ^ is 
modified as follows: 
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S log P[T] 

-^^D-' (s) D-lis)xi it) x,n (t) + \d-' is) XI (t) e, it) 



+\d-i'{s)xi {t)e^ (t) 
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where the condition Dy = Dji has been enforced while 
taking the variation with respect to D. Introducing 

the notation (a) ^ — »-°(*) ) obtain the foUow- 

ing equations, that, in general, have to be solved self- 
consistently: 



X^{t) = ds,{t) + {D,j{s)djT{s) 
~Da [s it))9i (t) 
^ ds, (t) ~ Vi (t) dt - Da [s {t))ei (t) dt 



djD,,{s))dt 



(5) 



The extra term would contribute a term ^Dij (s) 0j (t) dt 
in Eq. g]). 

Eq. H]), supplemented by the relations of Eq. ([2]) and jS]), 
coincides with the expression of the Onsager-Machlup 
action 0, namely with the probability to observe the tra- 
jectory T given the known diffusion and drift terms gov- 
erning its langevin evolution. In the present context , 
the probability P[T] of Eq. [T]can also be profitably in- 
terpreted from a complementary perspective. In fact, for 
a given trajectory, P[T] can be considered as a likelihood 
functional which characterizes the non-externally-driven 
system in terms of v and D . In this approach, extrem- 
izing Eq. ([T]) provides the unknown drift and diffusion 
terms [v and D) yielding the highest possible probability 
for the given observed trajectory, T. 

Notice that, at variance with other approaches (tI [lOt. 
the likelihood of the trajectory is here maximized with 
respect to the drift field Vi{s) and not the free energy 
F{s). An advantage of this alternative approach is that 
the solution can be expressed as an explicit average com- 
puted over the trajectory, at least in the one dimensional 
case (see Eq. ([TU])'). The disadvantage is that, when two 
or more collective variables are used, the optimal v and 
D are not guaranteed to yield an equilibrium probability 
measure^, but only to a stationary one. This condi- 
tion should indeed be verified a posteriori and provides 
a further consistency criterion for the viability of the ap- 
proach. 

The cardinal variational equations for the drift and 



diffusion terms are ^^r^-rx-^ 



some algebra one obtains: 



and = 0. After 
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dt{{ej,0k)-{e.p){ek)) (9) 



Eqs. ([8]) and ([9]) make possible to estimate D (s) and 

V (s) also from trajectories obtained in the presence of 
time-dependent forces acting on the system. It is note- 
worthy that Eqs. ^ and © tie the optimal estimates of 

V and D to suitable averages made on the trajectory. On 
one hand this lends to a straightforward numerical im- 
plementation of the scheme. On the other, it highlights 
a key difference between the Eqs. ([8]) and ([9|) for driven 
systems and the Kramers-Moyal coefficients of first and 
second order which connect the Langevin and Fokker- 
Planck descriptions of the system evolution. At variance 
with the spirit of the averages in the above equations, in 
fact, the time-dependent Kramers-Moyal coefficients are 
defined as averages over the Wiener process at a given 
time and value of the CV's. It is, however, clear that 
for a non-externally-driven system, where 9{t) =0 at all 
times, the averages in Eqs. ^ and ^ coincide with those 
over independent realizations of the noise, and hence v 
and D match the time- independent Kramers-Moyal co- 
efficients: 



(s) 



dt 



(dsi 



(dsidsj) — {dsi){dsj) 
2di 



In one dimension (namely for = 1) Eq. ([9]) can be ex- 
plicitly solved also for 9^0. Since D must be positive- 
defined, the second order Eq. ^ admits a single physi- 
cally viable solution: 
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dt 







Dis) 



(10) 

An important payoff of the maximum-likelihood ap- 
proach is that it leads straightforwardly to estimating 
the uncertainty on the inferred values of v* and D*^ (a 
star is used to denote the fact that these values maximize 
P[T]), associated to the limited statistics inherent in any 
trajectory with finite duration. To do so we consider the 
expansion of P[T] around the maximum retaining terms 
up to quadratic order. Introducing an A'^ + N'^ dimen- 
sional vector y (s) = (. . . , Ui (s) , . . . , Dij (s) , . . .), the er- 
ror on Hi at s, {yt (s)), can be estimated as the stan- 
dard deviation of yi{s) from its best available estimate, 
y*{s). This is given by cr^ (y^ (s)) = - {A~^) where 



(s) 



For 1 we have 
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with A^ (s) = J2t <5 (s — s (t)). Thus, the uncertainties on 
V and D are given by 



a' {v{s)) 
a' {Dis)) 



2 D 



1 + dtD i 



N{s)dtl + dtD{{e^) - (61)2) 

2 D^ 
N{s)l + dtD{{e^) - (6')2) 



(12) 



As intuitively expected, the quadratic error on v and D 
in s are inversely proportional to N (s), that is the num- 
ber of times the trajectory visited s. For a trajectory 
generated by an ordinary dynamics on a system whose 
reduced free energy exhibits several minima saddle points 
etc., this error will be highly non- uniform. Even if transi- 
tions from the various basins are observed, the statistical 
accuracy on v and D in the transition region will degrade 
very rapidly with the barrier height. This limitations can 
be overcome with the aid of expression (|12p and a pre- 
liminary rough knowledge of the free energy profile. In 
this case it is conceivable to design the application of the 
time-dependent external forces so to achieve an approx- 
imately uniform coverage of the phase space of interest. 
It should be remarked that Eq. quantifies the sta- 
tistical uncertainties on v and D and does not take into 
account possible systematic errors deriving from the non- 
Markovian nature of the process. These aspects will be 
discussed in more detail in Section II. E. 



II. APPLICATION: LOOPING OF A POLYMER 
CHAIN 



In the following we shall discuss the application of 
the above strategy to the problem of loop formation in 
a model polymer chain fiuctuating in a solvent rich in 
crowding molecules. The polymer model considered here, 
follows the one introduced in Ref. [2^ to study how 
the polymer looping kinetics is affected by the crowding 
agents The novel question addressed here regards 

the possibility to describe the evolution of the polymer 
end-to-end distance, s, by means of a Langevin equation. 
It will be shown that, for a suitable choice of the time 
interval with which the original trajectory is sampled, a 
Langevin description for the evolution of s is, in fact, pos- 
sible. Interestingly, despite the simplicity of the system 
and its formulation, both the optimally-recovered drift 
and diffusion terms have a non-trivial dependence on s. 

The model polymer consists of n spherical beads of 
radius R interacting via the following potential energy 
term: 



V 



ei > e 



-a{dij-2R) 



^2EMl-(^)^] (13) 



where the i and j denote the sequential indexing of the 
n chain beads. The first term in expression (|13p en- 
forces the self-avoidance of the chain, while the second 
provides the attractive interaction between consecutive 
beads, thus enforcing the chain connectivity as in the 
FENE modeljl^. The model parameters are exactly 
those introduced in Ref. [1^ to describe an eukariotic 
chromatin fiber, whose effective diameter and persistence 
length are both ~ 25 nm [H]. Specifically, ei and €2 are 
respectively 1 and 70 units of thermal energy, kbT, a = 4 
nm~^, and R = 12.5nm is the bead radius. At the chosen 
temperature, T = 300 K, the interplay of the two terms 
in ensures that distance between consecutive beads 
fiuctuates around the nominal value of 25 nm by only 
about 0.5 nm. The mass of the beads is calculated from 
the typical densities of biopolymers, p = 1.35 g/cm'^ [27| . 

As anticipated, the motion of the chromatin fiber 
is assumed to occur in a medium crowded by other 
biomolecules (proteins, RNA etc.) which are simply 
modeled as monodispersed globular particles of radius 
r ~ 2.5nm which altogether occupy 15% of the system 
volume. The crowding agents are not modelled explic- 
itly but rather through the Asakura-Oosawa (AO) mean- 
field approach ^281]. This approach exploits the small- 
ness of the crowding agents compared to the chain beads, 
to introduce the effective self-attraction of the polymer, 
known as depletion interaction, induced by the hard-core 
repulsion with the crowding agents. This additional self- 
interaction is described by the following potential energy 
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term: 

(14) 

where dij = 2r + d^j — dij, Ay = — and the 
step function 9 ensures that the AO depletion interac- 
tion vanishes at distances > d^ + 2r. The dynamics of 
each bead (subjected to a Stokes-Einstein friction appro- 
priate for molecular crowding |23|) was followed within a 
under-damped Langevin scheme |29[ with an integration 
time step of 1 ps, appropriate to resolve the decay of the 
correlation of the fastest-relaxing degrees of freedom of 
the system, the beads velocities. Part of the results pre- 
sented below are obtained analyzing the end-to-end dis- 
tance with a sampling time intervals of 200 ps or larger. 
The evolution of this quantity occurs over a time scale 
much larger than the relaxation time of the beads ve- 
locities and hence, for reasons of efficiency, was obtained 
through the over-damped Langevin scheme p9j with an 
integration time step of 15 ps. The equivalence of the 
under- and o?;er-damped schemes was explicitly verified 
by comparing the estimates of the drift and diffusion co- 
efficients obtained by processing runs covering 15 ms. 

We first followed the evolution of the isolated system 
and used the recorded trajectory for the analysis pre- 
sented in the previous section. We considered a sin- 
gle collective variable, namely the end-to-end distance, 
s (r) = ||rAr — ri||. As anticipated in the introduction, 
although the evolution of the original n-particle system is 
Markovian, the dynamics of s (r) might not be necessarily 
so. This issue will be discussed in detail in Section fllD I 
We set the chain length, n, equal to 5; this made possible 
to collect extensive trajectories with an affordable com- 
putational effort and hence validate, at least in part, the 
variational Langevin description by comparing the pre- 
dicted equilibrium properties against data obtained by 
straightforward histogram techniques. 

A. Equilibrium properties of the optimal model 

Starting from a random configuration of the polymer 
we have initially followed its underdamped Langevin dy- 
namics (in the absence of any external force) over a time 
span of 180ms. As illustrated by the sample time se- 
ries of s, shown in Fig. [TJ upper panel, this time span 
is much larger than the typical looping/unlooping times 
of the chain, and hence is a sufficient guarantee of equi- 
libration of the system properties. This trajectory was 
consequently used to estimate v (s) and D (s) and their 
errors from the averages in Eqs. ([5]), Q and p^ . In 
order to compute these averages it is necessary to choose 
the value for dt entering in the Langevin equation. The 
correct dt has to satisfy two conditions. First of all, dt 
must be so large that the underlying process can be con- 
sidered, at least approximately, as Markovian. At the 
same time, dt has to be sufficiently small that the typical 




time [ms] 



FIG. 1: Evolution of the end-to-end distance in the absence 
(upper panel) and presence (lower panel) of an external force. 

change of reaction coordinate over such time scale docs 
not reflect in a significant variation of the free energy and 
diffusion coefficient. 

This second condition can in principle be relaxed if the 
transition probability of the underlying Langevin process 
was known for an arbitrary dt [l3| . Exact expressions for 
finite-time transition probabilities are however available 
for very special potentials, notably harmonic ones fl3'|. In 
the present case, the free energy shape is not specified a 
priori, and hence it is necessary to use the approximation 
of Eq. which is valid for small dt. Thus, the two 
conditions specified above may be potentially mutually 
exclusive. 

For the simple polymer model described in this work 
it is possible to compute accurate equilibrium and ki- 
netic quantities directly from extensive simulations, and 
choose a posteriori the value of dt in order to obtain a 
Langevin model that reproduces them in as faithfully as 
possible. An alternative practical criterion for choosing 
this parameter without benchmarking the Langevin pre- 
dictions against the exact results will be described in the 
following. 

In Fig. [21 we compare the "true" canonical free energy of 
the system, F(s), obtained through the histogram of the 
values of s recorded in a long trajectory, with the one ob- 
tained computing D and v by equations ([H]), (HI, and in- 
tegrating numerically the stochastic differential equation 
formally given in eqn. ([4]). Specifically, the discrete-time 
evolution of s implemented numerically was: 



s{t + dt) = sit) - v*{s)dti + ^2D*{s)dti ■ r]{t) (15) 

where r]{t) is drawn from a Gaussian distribution with 
unit variance. In order to avoid systematic error deriv- 
ing from a finite integration time, the time increment dti 
is much smaller than dt. Qualitatively, the free energy 
profiles in Fig. [5] show two minima: one (denoted by 
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FIG. 2: Free energy. Thick line: profile obtained from the 
the 180 ms trajectory. Other curves: free energy profiles re- 
constructed from the optimally determined v{s) and D{s), for 
various choices of dt. The free energies are from the histogram 
of s as as F(s) = -ilogAf(s) = -ilog J dtS {s - s (t)) . The 
difi'erent curves are not distinguishable on the scale of the fig- 
ure. Inset: zoom of the region of the first minimum, showing 
that small deviations from the true profile are observed for 
dt > 50ns. 



L, for looped, in the following), for s < 27, correspond- 
ing to a state in which the two ends of the polymer are 
in contact; the other (denoted by U, for unlooped), for 
s > 27, corresponding to a state in which the two ends 
of the polymer are far. As we already remarked, D and 

V depend on the choice of the dt. The different curves 
are obtained solving the Langevin equation using D and 

V determined using with dt =4.5, 18 and 45 ns. All the 
choices lead to approximately the same profile. The pro- 
file starts degrading only for dt > 90ns (data not shown) . 

This provides an a posteriori demonstration that D (s) 
and V (s), if computed with an appropriate choice of dt, 
are consistent with the true equilibrium properties of the 
system. 



B. Kinetic properties of the optimal model 

A complementary, stringent, test of the viability of the 
recovered v* and D* profiles can be performed by in- 
vestigating kinetic-related properties of the system. The 
trajectory obtained from the Langevin model (|15p was 
processed to calculate the average residence time in the 
bound state, namely the time ti required by the system 
entering in the L state to escape from the well and en- 
tering in the U state. The normalised distributions of 
Ti obtained from the model Langevin equation (jl5p and 
from the original trajectory are shown in Fig. [31 
It can be seen that the two sets of distribution profiles 
are very consistent, thereby indicating the viability of the 
model Langevin description also for the kinetic system 
properties over time-scales much larger that dt. 




lc+05 2C+05 3C+05 4c+05 5c+05 6c+05 7c+05 



time [ns] 

FIG. 3: Normalized histogram of the residence times in the 
L state, computed from the real dynamics and from the tra- 
jectories generated by the optimal model. The L state cor- 
responds to whenever s is smaller than 26, and enters the U 
state whenever s is larger than 34. 




FIG. 4: D(s) and v(s) evaluated with Eqs. ^ and Q on an 
unbiased trajectory. Thick and thin lines are used for quanti- 
ties obtained from trajectories of duration equal to 10ms and 
0.5ms, respectively. The filled gray boxes represent the error 
bars calculated from Eqs. (|12p for the 0.5 ms-long case. 



C. Estimating the error 

The validity of equation for estimating the er- 
ror has been tested by comparing D (s) and v (s) com- 
puted from trajectories of different length. The results 
are shown in Fig. [J] The thick line represent D (s) and 
V (s) computed using all the 180 ms of the trajectory. As 
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FIG. 5: Free energy profiles. Thick line; profile obtained from 
the histogram of the trajectory of the non-externally-driven 
system. Dashed line: free energy profile obtained from the 
optimal Langevin description applied to data recorded in the 
presence of the harmonic time-dependent external force. The 
continuous line provides, for comparison the "free-energy" 
profile obtained directly from the histogram of s recorded in 
the run subject to the external force. 



we anticipated, even if the chosen collective variable is 
very simple, D (s) shows significant variations as a func- 
tion of s. The thin black line corresponds to the two 
quantities evaluated using a much shorter trajectory of 
9 ms. The solid gray blocks are the estimated errors as 
given by Eqs. (dH). The thin black lines falls well within 
the estimated error of the 180 ms result, showing that 
Eqs. ((T2)) provide viable estimates for the statistical un- 
certainties. 



D. Optimal model of the non-externally-driven 
system from an out-of-equilibrium trajectory 

A major advantage of the approach presented here is 
that it can be applied also on trajectories generated un- 
der the action of an external time-dependent forces. To 
illustrate this point we now consider a trajectory of the 
system under the action of an external force of the form 



d 
ds 



1 2 

-k (s - Srest {t)) 



(16) 



If this force is applied, the system is biased towards fol- 
lowing Srest (t). 

Although the externally applied force can be chosen a 
priori so to optimize the statistical uncertainty on the D 
and V profiles we shall consider the very simple case of an 
harmonic force derived from a harmonic restraining po- 
tential whose center is scillating between Smin and Smax 
with a period T = Ims: 



1 

^rest \J') — '^min H~ 77 (^^i 







t 

27r- 
T 



1 



The values Smin and Smax are set equal to 21 and 104 
nm, which hence cover a range of the original parameter 
space wide enough to encompass both minima of the free 
energy. A sample trajectory obtained under the action of 
this bias is shown in Fig. [Tl lower panel. As visible, the 
added external force influences heavily the evolution of 
the system which, in fact, exhibits a noisy harmonic mod- 
ulation. The external bias is so strong that a direct use of 
the the recorded s trajectory to compute the system free 
energy from the usual histogramming procedure would 
lead to a completely wrong free energy profile (shown 
with a thin continuous line in Fig. (S]). By contrast, the 
use of the optimal Langevin scheme derived above is very 
effective in subtracting the effect of the bias and yield a 
free energy profile that is entirely compatible with the 
true one. Notice that the bias subtraction docs not ex- 
ploit the knowledge of the instantaneous values of the 
external force, but relies merely on the knowledge of the 
time-averaged of the bias as a function of the collective 
variable. 



E. Validity of the Markovian approximation and 
optimal choice of dt 

As already mentioned, the choice of the time lag at 
which the data arc recorded, dt, is essential for the vi- 
ability and consistency of the proposed method, partic- 
ularly regarding the Markovian character of the chosen 
collective variable. If dt is too small, Eq. ^ is not ade- 
quate for describing the evolution of s, as the noise term 
(reflecting the influence of the "integrated" degrees of 
freedom) would have a sizable autocorrelation time. On 
the other hand, if the time lag is too large, there would 
be prominent variations of the free energy and diffusion 
coefficient evaluated for two "consecutive" positions, s{t) 
and s{t + dt). This would invalidate the assumption, see 
Eq. pS]) , that the force acting at time t depends only on 
the instantaneous position, s(t). 

If the recovered diffusion coefficient is constant in pa- 
rameter space and the underlying free-energy profile is 
harmonic, the Markovian character of the collective vari- 
able can be established by verifying the exponential decay 
of its autocorrelation function. More sophisticated pro- 
cedures must be followed to compute the memory kernel 
when the drift and diffusion terms do not have a struc- 
ture as simple as the one mentioned aboveQ, [lB|- Here 
we show a series of simple quantitative tests that can be 
used to assess the Markovian character of the collective 
variable on the time-scale defined by the sampling inter- 
val dt. These tests can be easily used to find an optimal 
value of dt. 

To this purpose we solve Eq. ^ with respect to the 
noise dWj (t) : 

dW, it) = ^A^'^' {ds, it) - V, is it)) dt ) . (18) 



(17) Using the estimates v (s) and D (s) given by Eqs. ^ 
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time / dt 



Reference Gaussian 

1-5 - o dt=2ns 




FIG. 6: Upper panel: Time correlation function Cdt (t) = 
{dW {t) dW {t + t)) as a function of r/dt for different 
choices of dt. Lower panel: Probability distibution of dW/\/dt 
for different choices of the time lag dt. dW is computed from 
Eq. (|18|l and the average is restricted to values of t in which 
27.5 < a{t) < 30.5 (approximately the region of the barrier). 
Inset: normalized kurtosis, k = (< dW^ > —3 < dW^ >^ 
=< dW^ > /dt^ — 3, as a function of the time lag dt. 

and one can evaluate dWj (t) along the trajectory 
{. . . , s (t) , s {t + dt) , . . .} . It is readily seen that dWj (t) 
from Eq. satisfies: 

(dW^it)) = 
{dW, (t) dWj {t)) = dt5^j . 

Yet, the internal consistency of the procedure requires 
that dWj (t) is uncorrclated at different times and that 
its probability distribution is Gaussian. These two prop- 
erties are not enforced in the optimization procedure, and 
they can hence be used to validate, a posteriori its appli- 
cability. 

As a first step in the validation we calculate the au- 
tocorrelation function of the noise for given values of 
the time lag dt, Cat (r) = ^ {dW (t) dW {t + r)). The 
averages were calculated from a single underdamped 
Langevin evolution of the system and the sampling time- 



lag, dt, ranged between 0.001 and 2000 ns. The resulting 
autocorrelations as a function of r/dt are shown in Fig. [51 
upper panel. The trends should be compared with the 
step character of a meniorylcss noise: Cdt (0) = 1 and 
Cdt (t) = for all t ^ 0. This limiting behaviour is well- 
approximated for large dt, as the autocorrelation drops 
almost immediately to zero {Cdt (dt) ~ 0.01 for dt ^ 1 
ns). A slow decay is, instead observed for smaller dt's, 
indicating that the underlying stochastic process derives 
from a correlated noise. 

By inspecting suitable properties of the noise dWj (t) 
of Eq. p8)) it is further possible to highlight the limita- 
tions of excessively-large values of dt. A valuable indica- 
tor is provided by the Gaussian character of the distri- 
bution of the instantaneous noise amplitudes ^^^=^. The 
histograms in Fig. [51 lower panel, indicate noticeable de- 
viations from Gaussianity for dt larger than 500 ns. Also 
the normalized kurtosis (see Fig. [51 lower panel, inset) 
becomes significantly different from zero for dt > 100 ns. 

It emerges that dt must be chosen so to satisfy simul- 
taneously the criteria for the memoryless character of the 
noise and the Gaussianity of its probability distribution. 
For the specific system considered here, it can be verified 
that using dt between 1 and 100 ns provides a viable and 
consisent Langevin description of the system. In fact, 
in these conditions, the autocorrelation function of the 
noise has the correct form and, at the same time, the 
probability distribution of dW is very close to a Gaus- 
sian. 



III. CONCLUSIONS 

We have presented an optimal scheme for describing 
a posteriori the dynamics of a given system through 
the Langevin evolution of few collective variables. The 
scheme lends to a straightforward numerical implemen- 
tation. It allows one to extract not only the drift pro- 
file but also the diffusion coefficient which may both de- 
pend on the collective variables. The proposed method- 
ology allows to control the statistical uncertainty affect- 
ing the calculated drift and diffusion profiles. Secondly, 
the drift and diffusion terms of the non-externally-driven 
system can be recovered from trajectories recorded in the 
presence of an externally-applied force. This second as- 
pect appears particularly important as the external time- 
dependent force can be designed to optimize, for a given 
duration of the system evolution, the exploration of the 
phase space and control the statistical uncertainty on the 
parameters of the Langevin equation. The viability of 
the scheme was illustrated by applying it to the looping 
kinetics of a model polymer system. As several other 
approaches, the proposed one can be applied to systems 
describable by an overdamped Langevin dynamics. We 
plan to explore the feasibility of extending the present 
framework to the case of CV evolving under the action 
of non-trivial memory kernels. This would be a particu- 
larly important avenue for characterizing the salient dy- 
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namical features of biomolccules which is presently at- 
tracting considerable attention due to the large range of 
time scales that these molecules exhibit in their internal 
dynamics [1, M, M, M, M,M,^- 
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